ABSTRACT


In this paper, we discuss the basic concepts and fundamentals of Natural
Language Generation (NLG), a field of Natural Language Engineering that deals
with the conversion of non-linguistic data into natural language text. We start
by introducing the NLG system and its different types. We also pinpoint the
major differences between NLG and Natural Language Understanding (NLU).
Afterwards, we shed light on the architecture of a basic NLG system, its
advantages and disadvantages. Later, we examine the different applications of
NLG and present a case study that illustrates how an NLG system operates from
an algorithmic point of view. Finally, we review some existing real-world NLG
systems together with their features.
I. INTRODUCTION

Natural Language Generation (NLG) is the process of
constructing natural language outputs from
non-linguistic inputs. One of the central goals of NLG is
to investigate how computer programs can be made to
produce high-quality, expressive, uncomplicated
natural language text from sophisticated
computer-internal representations of information [1].

II. NLG vs. NLU 

NLG is the inverse of NLU (Natural Language
Understanding), also called NLI (Natural Language
Interpretation): NLG maps from meaning to text,
while NLU maps from text to meaning [2]. NLG is
easier than NLU because an NLU system cannot
control the complexity of the language structure it
receives as input, whereas an NLG system can limit
the complexity of the structure of its output. Table 1
delineates the differences between NLG and NLU.


Table 1 - NLG vs. NLU

NLG                     | NLU
Relatively unambiguous  | Ambiguity in input
Well-formed             | Ill-formed input
Well-specified          | Under-specification


III. NLG SYSTEM 

> Goal: Computer software which produces 
understandable and appropriate texts in English 
or other human languages. 

> Input: Some underlying non-linguistic
representation of information.

> Output: Documents, reports, explanations, help 
messages, and other kinds of texts. 

> Knowledge sources required: knowledge of 
language and of the domain. 

IV. TYPES OF NLG SYSTEMS 

NLG systems range from the simplest ones, canned
text and template filling systems, to sophisticated
systems that adapt to realistic changes and variations
in the information of a particular domain [3].

A. Canned Text 

Text generation can be as simple as keeping a list of
canned text that is copied and pasted, possibly linked
or concatenated with some glue text. The results may
be satisfactory in simple domains such as horoscope
machines or generators of personalized business
letters. Canned text systems are easy to implement,
but cannot adapt to new situations without the
intervention of a programmer [4].
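
The canned text idea can be illustrated with a minimal Python sketch. The snippet names and texts below are hypothetical and only serve the illustration; stored strings are copied verbatim and joined with glue text.

```python
# Hypothetical canned snippets for a personalized business letter.
CANNED = {
    "greeting": "Dear customer,",
    "late_payment": "our records show that your last payment is overdue.",
    "thanks": "Thank you for your continued business.",
}


def canned_letter(keys, glue=" "):
    """Copy the stored snippets verbatim and join them with glue text."""
    return glue.join(CANNED[key] for key in keys)


letter = canned_letter(["greeting", "late_payment", "thanks"])
print(letter)
```

Asking for a situation that has no stored snippet (for example `canned_letter(["refund"])`) raises a `KeyError`, which mirrors the limitation noted above: a programmer has to add new text before the system can handle a new situation.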

B. Template Filling 

In this approach, a template is filled by entering data
into slots and fields, and a natural language statement
is generated. Junk mail, for example, is generated by
template filling systems that place the addressee's
name in the right position in each letter. Template
filling is easy to implement but not flexible enough to
handle applications with any realistic variation in the
information being expressed or in the context of its
expression. Figure 1 depicts the architecture of a
template filling NLG system.
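
As a rough sketch of template filling, the following Python example uses the standard library class `string.Template`. The template wording and the slot names (`title`, `name`, `offer`) are assumptions made for illustration.

```python
from string import Template

# Hypothetical junk-mail template with three slots.
LETTER = Template("Dear $title $name, you have been selected to receive $offer.")


def fill(template, slots):
    """Fill every slot of the template; a missing slot raises KeyError."""
    return template.substitute(slots)


text = fill(LETTER, {"title": "Ms.", "name": "Jones", "offer": "a free catalogue"})
print(text)
```

The fixed wording around the slots is what makes the approach inflexible: any variation that the template author did not anticipate requires a new template.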



Figure 1 - Template Filling NLG System 


C. Advanced NLG Systems 

As stated previously, canned text and template filling
systems are not flexible enough to deal with emerging
situations and real-world problems. Therefore, new
NLG systems were investigated in order to solve
complex and advanced problems. These systems
must make the following choices [5]:

> Content Selection: The system must choose the 
appropriate content to express and generate 
natural output based on a specific communicative 
goal. 

> Lexical Selection: The system must choose the 
lexical items most appropriate for expressing 
particular concepts. 

> Sentence Structure Aggregation: The system must 
generate phrases, clauses and sentence-sized 
chunks. 

> Discourse Structure: The system must deal with 
multi-sentence discourse which has a coherent 
structure. 

V. NLG SYSTEM ARCHITECTURE 

A modern architecture for NLG systems comprises a
knowledge base, a discourse planner, and a surface
realizer. The discourse planner selects from a
knowledge pool which information to include in the
output, and creates a text structure to ensure
coherence. On a more local scale, the planner
processes the content of each sentence and orders its
parts. The surface realizer takes the discourse
specification as input and converts sentence-sized
chunks of representation into grammatically correct
sentences [6]. Figure 2 shows the basic architecture of
an NLG system.
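
The flow from knowledge base to discourse planner to surface realizer can be sketched as below. This is a toy pipeline under stated assumptions, not the architecture of any particular system: the `Message` class, the example facts, and the single sentence pattern are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    topic: str  # what the fact is about
    value: str  # the content to express


# Hypothetical knowledge base for a weather domain.
KNOWLEDGE_BASE = [
    Message("temperature", "cooler than average"),
    Message("rainfall", "drier than average"),
    Message("wind", "calm"),
]


def discourse_planner(kb, wanted_topics):
    """Select the facts to express and order them as in wanted_topics."""
    by_topic = {message.topic: message for message in kb}
    return [by_topic[topic] for topic in wanted_topics if topic in by_topic]


def surface_realizer(plan):
    """Turn each planned message into one sentence (single fixed pattern)."""
    return [f"The month was {message.value}." for message in plan]


def generate(kb, wanted_topics):
    """Run the planner, then the realizer, and join the sentences."""
    return " ".join(surface_realizer(discourse_planner(kb, wanted_topics)))


report = generate(KNOWLEDGE_BASE, ["temperature", "rainfall"])
print(report)
```

Keeping content selection and ordering separate from sentence construction is the main design point: either stage can be replaced without touching the other.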



Figure 2 - NLG System Architecture 

A. Knowledge Base 


It contains all the information of a specific domain. It
is a large general-purpose knowledge base that
supports domain-specific applications, which helps to
speed up and improve the porting and testing of the
generator on new applications.

B. Communicative Goal 

It designates the intended audience that is going to
use the system. Stylistic variations serve to
express significant interpersonal and situational
meanings (text can be formal or informal, slanted or
objective, colorful or dry, etc.).

C. Discourse Planner 

It selects the content from the knowledge base and
then structures that content appropriately. The result
is a specification of all choices made for the entire
communication, potentially spanning multiple
sentences and including other annotations. In other
words, the discourse planner takes a specified input
and generates linear chunks of information. The two
approaches used by discourse planners are Text
Schemata and Rhetorical Relations [7].

D. Text Schemata 

It is a mechanism that structures the output by
representing text patterns as high-level procedures,
similar to states.

E. Rhetorical Relations 

It is based on RST (Rhetorical Structure Theory),
which designates a central segment of text called the
nucleus and a more peripheral segment called the
satellite. RST relations are defined in terms of the
constraints they place on the nucleus, on the satellite,
and on the combination of the nucleus and satellite [8].
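
A nucleus and satellite structure can be represented as a small tree. The sketch below is only an illustration: the `Relation` class and the choice of which segment is the nucleus are assumptions, and the constraints that RST places on each relation are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Relation:
    name: str                              # e.g. "CONTRAST", "ELABORATION"
    nucleus: Union["Relation", str]        # central segment
    satellites: List[Union["Relation", str]] = field(default_factory=list)


def leaves(node):
    """Collect text segments left to right: nucleus first, then satellites."""
    if isinstance(node, str):
        return [node]
    collected = leaves(node.nucleus)
    for satellite in node.satellites:
        collected.extend(leaves(satellite))
    return collected


# Hypothetical plan: which segment is the nucleus is an assumption.
contrast = Relation(
    "CONTRAST",
    nucleus="rainfall amounts were mostly small",
    satellites=["there was rain on every day for 8 days"],
)
plan = Relation(
    "SEQUENCE",
    nucleus="the month was cooler than average",
    satellites=[contrast],
)
print(leaves(plan))
```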

F. Surface Realizer 

It receives the fully specified discourse plan and
generates individual sentences as constrained by its
lexical and grammatical resources. In other words,
the surface realizer converts text specifications into
actual natural text. The linguistic realizations
involved in the surface realization process are the
following:

> Insert function words 

> Choose correct inflection of content words 

> Order words within a sentence 

> Apply orthographic rules 

The two approaches used by surface realizers are 
Systemic Grammar and Functional Unification 
Grammar. 
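
The four realization steps listed above can be sketched in a few lines of Python. This is a deliberately tiny example: the input format (`subject`, `verb`, `tense`, `complement`) and the two-entry lexicon are hypothetical, and only the past tense is inflected.

```python
# Tiny hypothetical lexicon of irregular past-tense forms.
PAST_FORMS = {"be": "was", "have": "had"}


def inflect(verb, tense):
    """Choose the inflected form; only the past tense is handled here."""
    if tense == "past":
        return PAST_FORMS.get(verb, verb + "ed")
    return verb  # other tenses are left as the base form in this sketch


def realize(spec):
    """Turn a sentence specification into a single sentence string."""
    subject = "the " + spec["subject"]            # insert function word
    verb = inflect(spec["verb"], spec["tense"])   # inflect content word
    words = [subject, verb, spec["complement"]]   # order words
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."  # orthographic rules


sentence = realize(
    {"subject": "month", "verb": "be", "tense": "past",
     "complement": "cooler than average"}
)
print(sentence)
```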

G. Systemic Grammar 

It represents sentences as collections of functions and
maintains rules for mapping those functions onto
explicit grammatical forms. In Table 2, the one
performing the action is the subject (I), the action or
process carried out by the actor is the verb (eat), and
the object acted upon is the sandwich [9].


Table 2 - Systemic Grammar Example

Sentence      | I       | eat        | sandwich
Mood          | Subject | Predicator | Object
Transitivity  | Actor   | Process    | Goal
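
The layered view of Table 2 can be written down as a small data structure. The sketch below is an assumption about how such layers might be stored and says nothing about how a real systemic grammar is implemented; the verb's Mood label is spelled Predicator, the usual term in systemic grammar.

```python
# Words of the example sentence, in order.
WORDS = ["I", "eat", "sandwich"]

# Each layer assigns one function label per word position.
LAYERS = {
    "Mood": ["Subject", "Predicator", "Object"],
    "Transitivity": ["Actor", "Process", "Goal"],
}


def functions_of(word):
    """Return the label a word carries in every layer."""
    position = WORDS.index(word)
    return {layer: labels[position] for layer, labels in LAYERS.items()}


def word_for(layer, label):
    """Map a function label back onto the word that fills it."""
    return WORDS[LAYERS[layer].index(label)]


print(functions_of("eat"))
print(word_for("Transitivity", "Goal"))
```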


H. Functional Unification Grammar

It is based on feature grammars. The basic idea is to
build the generation grammar as a feature structure
with a list of all possible alternations, and then unify
this grammar with an input specification built using
the same sort of feature structure.
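
Unification of two feature structures can be sketched with nested dictionaries. This is a simplified illustration: it handles plain attribute-value structures only and omits the alternations (disjunctions) that a full functional unification grammar contains. The feature names are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures; return None when values clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # clash somewhere below this key
                result[key] = merged
            else:
                result[key] = b_value
        return result
    # Atomic values unify only when they are equal.
    return a if a == b else None


# Hypothetical grammar fragment and input specification.
grammar = {"cat": "S", "subject": {"number": "singular"},
           "verb": {"number": "singular"}}
spec = {"subject": {"lex": "I"}, "verb": {"lex": "eat"}}

unified = unify(grammar, spec)
print(unified)
```

A clash, such as unifying `{"number": "singular"}` with `{"number": "plural"}`, yields `None`, which signals that the input specification is not compatible with that part of the grammar.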

VI. APPLICATIONS OF NLG SYSTEMS 

> Database Content Display: 

The description of database contents in natural 
language is not a new problem, and some such 
generators already exist for specific databases. The 
general solution still poses problems, however, since 
even for relatively simple applications it still includes 
unsolved issues in sentence planning and text 
planning. 

> Expert System Explanation: 

This is a related problem that often requires more
interactive ability, since the user's queries may not
only elicit more information from a (static, and hence
well-structured) database, but may also cause the
expert system to perform further reasoning, which
requires the dynamic explanation of system behavior,
expert system rules, etc. This application also
involves issues in text planning, sentence planning,
and lexical choice.

> Speech Generation: 

Simplistic text-to-speech synthesis systems have been 
available commercially for a number of years, but 
naturalistic speech generation involves unsolved 
issues in discourse and interpersonal pragmatics (for 
example, the intonation contour of an utterance can 
express dislike, questioning, etc.). Today, only the 
most advanced speech synthesizers compute 
syntactic form as well as intonation contour and pitch 
level. 

> Limited Report and Letter Writing: 

As mentioned in the previous section, with 
increasingly general representations for text 
structure, generator systems will increasingly be able 
to produce standardized multi-paragraph texts such 
as business letters or monthly reports. The problems 
faced here include text plan libraries, sentence 
planning, adequate lexicons, and robust sentence 
generators. 

> Automated document production: 

Such as weather forecasts, simulation reports, letters 
etc. 

> Presentation of information to people in an 
understandable fashion: 

Such as medical records, expert system reasoning etc. 

VII. CASE STUDY: WEATHER FORECAST 

In this case study, we discuss the specifications of
an NLG system for weather forecasting and show the
different phases needed to transform the input
specification into natural output text. Figure 3
depicts the weather forecast NLG system structure
[10].



Figure 3 - Weather Forecast NLG System Structure 


A. Specifications 

> Goal: Produce understandable natural texts in 
English to indicate weather situations 

> Input: Special commands or syntax
representation of information.

> Output: Report of natural English texts. 

> Knowledge sources required: knowledge of the 
English language and of the domain of weather 

B. Phases 

The Discourse Planner takes as input the language
commands and generates different chunks of
information, classified in a tree-like structure which is
depicted in Figure 4.



[Figure 4: tree rooted at DOCUMENTPLAN whose NUCLEUS and SATELLITE nodes
are linked by SEQUENCE, ELABORATION, and CONTRAST relations. Leaves: cooler
than average, drier than average, average # raindays, rain so far, rainspell,
rain amounts.]

Figure 4 - Discourse Planner Results 


The Surface Realizer takes as input the leaves of 
the tree produced previously and generates single 
grammatically correct natural sentences. 

The month was cooler than average. 

The month was drier than average. 

There was the average number of rain days.

The total rain for the year so far is well below 
average. 

There was rain on every day for 8 days from 11th to
18th.

Rainfall amounts were mostly small. 

The Surface Realizer then processes the above
sentences and produces a coherent natural English
paragraph.

The month was cooler and drier than average, with
the average number of rain days, but the total rain for
the year so far is well below average. Although there
was rain on every day for 8 days from 11th to 18th,
rainfall amounts were mostly small.
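
One step of this final phase, merging two sentences that share their beginning and end, can be sketched as follows. This is a simple word-matching heuristic written for illustration; it is not the method used by the system in the case study.

```python
def aggregate(first, second):
    """Merge two sentences that share leading and trailing words.

    Falls back to plain concatenation when nothing can be merged.
    """
    w1 = first.rstrip(".").split()
    w2 = second.rstrip(".").split()
    shortest = min(len(w1), len(w2))

    # Count shared leading words.
    head = 0
    while head < shortest and w1[head] == w2[head]:
        head += 1

    # Count shared trailing words, not overlapping the shared head.
    tail = 0
    while tail < shortest - head and w1[-1 - tail] == w2[-1 - tail]:
        tail += 1

    middle1 = w1[head:len(w1) - tail]
    middle2 = w2[head:len(w2) - tail]
    if head == 0 or tail == 0 or not middle1 or not middle2:
        return f"{first} {second}"

    merged = w1[:head] + middle1 + ["and"] + middle2 + w1[len(w1) - tail:]
    return " ".join(merged) + "."


combined = aggregate(
    "The month was cooler than average.",
    "The month was drier than average.",
)
print(combined)
```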
VIII. EXISTING NLG SYSTEMS

In this section, we present some existing real-world
NLG systems.

A. FoG 

> Function: Produces textual weather reports in 
English and French 

> Input: Graphical/numerical weather depiction 

> User: Environment Canada (Canadian Weather 
Service) 

> Developer: CoGenTex 

> Status: Fielded, in operational use since 1992 

Figure 5 shows the input of FoG, while Figure 6
shows its output.



Figure 5 - FoG Non-Linguistic Input 


FPCN20 Status: CURRENT-NOT RELEASED 

FPCN20 CWEG 152300

MARINE FORECASTS FOR ARCTIC WATERS ISSUED BY THE ARCTIC WEATHER CENTRE
OF ENVIRONMENT CANADA AT 05.00 PM MDT SATURDAY 15 APRIL 1995 FOR TONIGHT
AND SUNDAY WITH AN OUTLOOK FOR MONDAY.

THE NEXT SCHEDULED FORECAST WILL BE ISSUED AT 05.00 AM MDT. 

WINDS ARE IN KNOTS. 

FOG IMPLIES VISIBILITY LESS THAN 5/8 NM. 

MIST IMPLIES VISIBILITY 5/8 TO 6 NM. 

GREAT SLAVE LAKE.

WINDS LIGHT TONIGHT AND SUNDAY. SNOW ENDING NEAR MIDNIGHT. VISIBILITIES 
NEAR 2 NM IN SNOW. 

OUTLOOK FOR MONDAY... LIGHT WINDS. 

GREAT BEAR LAKE. 

FREEZING SPRAY WARNING ISSUED. 

WINDS EAST 20 TO 25 TONIGHT AND SUNDAY. FREEZING SPRAY. 

OUTLOOK FOR MONDAY... WINDS EASTERLY 20 TO 25.

MACKENZIE RIVER FROM MILE 0 TO MILE 100. 

WINDS LIGHT TONIGHT AND SUNDAY. SNOW ENDING THIS EVENING. VISIBILITIES 
NEAR 2 NM IN SNOW. 

OUTLOOK FOR MONDAY... LIGHT WINDS. 

MACKENZIE RIVER FROM MILE 100 TO MILE 300. 

WINDS LIGHT STRENGTHENING TO SOUTHEAST 15 SUNDAY AFTERNOON. SNOW ENDING 
EARLY THIS EVENING. VISIBILITIES NEAR 2 NM IN SNOW. 

OUTLOOK FOR MONDAY... WINDS SOUTHEASTERLY 15. 





Figure 7 shows the input of STOP, while Figure 8
shows its output.

SMOKING QUESTIONNAIRE

Please answer by marking the most appropriate box for each question like this: [x]

Q1 Have you smoked a cigarette in the last week, even a puff?

YES [x]  Please complete the following questions.

NO □  Please return the questionnaire unanswered in the
envelope provided. Thank you.

Please read the questions carefully. If you are not sure how to answer, just
give the best answer you can.

Q2 Home situation:

Live alone □   Live with husband/wife/partner [x]   Live with other adults □   Live with children [x]

Q3 Number of children under 16 living at home

Q4 Does anyone else in your household smoke? (If so, please mark all boxes which apply)

husband/wife/partner [x] other family member [x] others □

Q5 How long have you smoked for? ..10.. years

Tick here if you have smoked for less than a year


Figure 7 - STOP Questionnaire 



Figure 8 - STOP Natural Output 

C. Loughaty

> Function: Generator of natural programming
instructions [11]

> Input: Template wizards that the user fills in to
generate programming instructions

> Usage: Learning the basic concepts of
programming


Figure 9 shows the input of Loughaty, while Figure
10 shows its output.



Figure 9 - Loughaty's Fill-in Template 




Figure 6 - FoG Natural Text Output 

B. STOP System 

> Function: Produces a personalized
smoking-cessation leaflet

> Input: Questionnaire about smoking attitudes, 
beliefs, history 

> User: NHS (British Health Service) 

> Developer: University of Aberdeen 



Figure 10 - Loughaty's Generated Natural 
Instructions 